Statistical method for building a translation memory

ABSTRACT

A statistical translation memory (TMEM) may be generated by training a translation model with a naturally generated TMEM. A number of tuples may be extracted from each translation pair in the TMEM. The tuples may include a phrase in a source language and a corresponding phrase in a target language. The tuples may also include probability information relating to the phrases generated by the translation model.

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims the benefit of, and incorporates herein, U.S. Provisional Patent Application No. 60/291,852, filed May 17, 2001.

ORIGIN OF INVENTION

[0002] The research and development described in this application were supported by DARPA-ITO under grant number N66001-00-1-9814. The U.S. Government may have certain rights in the claimed inventions.

BACKGROUND

[0003] Machine translation (MT) concerns the automatic translation of natural language sentences from a first language (e.g., French) into another language (e.g., English). Systems that perform MT techniques are said to “decode” the source language into the target language.

[0004] A statistical MT system that translates French sentences into English has three components: a language model (LM) that assigns a probability P(e) to any English string; a translation model (TM) that assigns a probability P(f|e) to any pair of English and French strings; and a decoder. The decoder may take a previously unseen sentence f and try to find the e that maximizes P(e|f), or equivalently maximizes P(e)·P(f|e).
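The decoder's objective can be illustrated with a minimal sketch. It assumes a tiny hand-made candidate list and invented probability tables; the names `LM`, `TM`, and `decode` and all numeric values are hypothetical and do not come from this application. A practical decoder would search a far larger space of candidate strings.

```python
import math

# Hypothetical toy tables; every probability here is invented for illustration.
LM = {"no answer": 0.02, "not of answer": 0.0005}            # P(e)
TM = {("pas de réponse", "no answer"): 0.3,                   # P(f|e)
      ("pas de réponse", "not of answer"): 0.5}

def decode(f, candidates):
    """Return the candidate e that maximizes P(e) * P(f|e)."""
    def score(e):
        p = LM.get(e, 0.0) * TM.get((f, e), 0.0)
        # Work in log space; unseen pairs get the lowest possible score.
        return math.log(p) if p > 0 else float("-inf")
    return max(candidates, key=score)

best = decode("pas de réponse", ["no answer", "not of answer"])
```

Although the second candidate has the higher translation-model score in this toy table, the language model penalizes it enough that the first candidate is selected.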

SUMMARY

[0005] A statistical translation memory (TMEM) may be generated by training a translation model with a naturally generated TMEM. A number of tuples may be extracted from each translation pair in the TMEM. The tuples may include a phrase in a source language and a corresponding phrase in a target language. The tuples may also include probability information relating to the phrases generated by the translation model.

[0006] A number of phrases in the target language may be paired with the same phrase in the source language. The target language phrase having the highest probability of correctness may be selected as a translation equivalent. Alternatively, the target language phrase occurring most frequently in the extracted phrases may be selected as a translation equivalent.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] FIG. 1 is a block diagram of a system for generating a statistical translation memory (TMEM).

[0008] FIG. 2 illustrates the results of a stochastic word alignment operation.

[0009] FIG. 3 is a flowchart describing a stochastic process that explains how a source string can be mapped into a target string.

[0010] FIG. 4 is a flowchart describing an operation for generating a statistical TMEM.

[0011] FIG. 5 is a table including tuples generated from the two sentences in FIG. 2.

[0012] FIG. 6 is a table including examples of phrases from an FTMEM and their corresponding correctness judgments.

DETAILED DESCRIPTION

[0013] FIG. 1 illustrates a system 100 for generating and storing translation pairs in a statistical translation memory (TMEM) 105. The statistical TMEM 105 may be used in machine translation (MT) to translate from a source language (e.g., French) to a target language (e.g., English). The MT system 100 may use the statistical TMEM 105 to find translation examples that are relevant for translating an unseen (input) sentence, and then modify and integrate translation fragments to produce correct output sentences.

[0014] A pre-compiled TMEM 110 including naturally-generated translation pairs may be used as the basis of the statistical TMEM 105. For example, for a French/English MT, a TMEM such as the Hansard Corpus, or a portion thereof, may be used. The Hansard Corpus includes parallel texts in English and Canadian French, drawn from official records of the proceedings of the Canadian Parliament. The Hansard Corpus is presented as a sequence of sentences in a version produced by IBM Corporation. The IBM collection contains nearly 2.87 million parallel sentence pairs.

[0015] Sentence pairs from the corpus may be used to train a statistical MT module 115. The statistical MT module 115 may implement a translation model 120, such as the IBM translation model 4, described in U.S. Pat. No. 5,477,451. The IBM translation model 4 revolves around the notion of a word alignment over a pair of sentences, such as that shown in FIG. 2. A word alignment assigns a single home (English string position) to each French word. If two French words align to the same English word, then that English word is said to have a fertility of two. Likewise, if an English word remains unaligned-to, then it has fertility zero. If a word has fertility greater than one, it is called very fertile.
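The relationship between a word alignment and fertility can be sketched as follows. The alignment below is hypothetical (it is not the alignment of FIG. 2), and the function name `fertilities` and the convention that position 0 stands for the NULL element are assumptions made for illustration.

```python
from collections import Counter

def fertilities(alignment, english_length):
    """Count how many French words align to each English position.

    alignment[j] is the English position (1-based) that is the "home" of
    French word j; 0 denotes the NULL element. The result is indexed by
    English position, with index 0 holding the fertility of NULL.
    """
    counts = Counter(alignment)
    return [counts.get(i, 0) for i in range(english_length + 1)]

# Hypothetical alignment of five French words to a four-word English string.
fert = fertilities([1, 2, 2, 0, 4], english_length=4)

# English positions (excluding NULL) with fertility greater than one.
very_fertile = [i for i, n in enumerate(fert) if i > 0 and n > 1]
```

In this example the English word at position 2 has fertility two and is therefore very fertile, while the word at position 3 is unaligned-to and has fertility zero.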

[0016] The word alignment in FIG. 2 is shorthand for a hypothetical stochastic process by which an English string gets converted into a French string. FIG. 3 is a flowchart describing, at a high level, such a stochastic process 300. Every English word in the string is first assigned a fertility (block 305). These assignments may be made stochastically according to a table n(ø_(i)|e_(i)). Any word with fertility zero is deleted from the string, any word with fertility two is duplicated, etc. After each English word in the new string, the fertility of an invisible English NULL element is incremented with probability p₁ (typically about 0.02) (block 310). The NULL element may ultimately produce “spurious” French words. A word-for-word replacement of English words (including NULL) by French words is performed, according to the table t(f_(j)|e_(i)) (block 315). Finally, the French words are permuted (block 320). In permuting, IBM translation model 4 distinguishes between French words that are heads (the leftmost French word generated from a particular English word), non-heads (non-leftmost, generated only by very fertile English words), and NULL-generated.
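Blocks 305 through 315 of this process can be sketched in a few lines. This is an illustrative simplification, not the trained model: the tables `n_table` and `t_table`, their values, and the function name `generate` are hypothetical, and the permutation step (block 320) is omitted, so the French words are returned in generation order.

```python
import random

def generate(english, n_table, t_table, p1, rng):
    """Sample French words from an English string (blocks 305-315 only)."""
    # Block 305: draw a fertility for each English word and copy the word
    # that many times (fertility zero deletes it, two duplicates it).
    expanded = []
    for e in english:
        values, probs = zip(*n_table[e].items())
        phi = rng.choices(values, weights=probs)[0]
        expanded.extend([e] * phi)
    # Block 310: after each word of the new string, increment the fertility
    # of the invisible NULL element with probability p1.
    null_fertility = sum(1 for _ in expanded if rng.random() < p1)
    expanded.extend(["NULL"] * null_fertility)
    # Block 315: word-for-word replacement of English words (including NULL).
    french = []
    for e in expanded:
        words, probs = zip(*t_table[e].items())
        french.append(rng.choices(words, weights=probs)[0])
    return french

# Hypothetical tables; with p1 = 0 no NULL-generated words are produced.
n_table = {"there": {0: 1.0}, "is": {1: 1.0}, "no": {2: 1.0}}
t_table = {"is": {"est": 1.0}, "no": {"aucun": 0.5, "ne": 0.5},
           "NULL": {".": 1.0}}
sample = generate(["there", "is", "no"], n_table, t_table,
                  p1=0.0, rng=random.Random(0))
```

Because the fertilities in these toy tables are deterministic, the sample always contains three French words: one generated by "is" and two generated by "no".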

[0017] The head of one English word is assigned a French string position based on the position assigned to the previous English word. If an English word e_(i−1) translates into something at French position j, then the French head word of e_(i) is stochastically placed in French position k with distortion probability d₁(k−j|class(e_(i−1)), class(f_(k))), where “class” refers to automatically determined word classes for French and English vocabulary items. This relative offset k−j encourages adjacent English words to translate into adjacent French words. If e_(i−1) is infertile, then j is taken from e_(i−2), etc. If e_(i−1) is very fertile, then j is the average of the positions of its French translations.

[0018] If the head of English word e_(i) is placed in French position j, then its first non-head is placed in French position k (>j) according to another table d_(>1)(k−j|class(f_(k))). The next non-head is placed at position q with probability d_(>1)(q−k|class(f_(q))), and so forth.

[0019] After heads and non-heads are placed, NULL-generated words are permuted into the remaining vacant slots randomly. If there are ø₀ NULL-generated words, then any placement scheme is chosen with probability 1/ø₀!.

[0020] These stochastic decisions, starting with e, result in different choices of f and an alignment of f with e. The string e is mapped onto a particular <a,f> pair with probability:

$$P(a, f \mid e) = \prod_{i=1}^{l} n(\varphi_i \mid e_i) \times \prod_{i=1}^{l} \prod_{k=1}^{\varphi_i} t(\tau_{ik} \mid e_i) \times \prod_{i=1,\, \varphi_i > 0}^{l} d_1\left(\pi_{i1} - c_{\rho_i} \mid \mathrm{class}(e_{\rho_i}), \mathrm{class}(\tau_{i1})\right) \times \prod_{i=1}^{l} \prod_{k=2}^{\varphi_i} d_{>1}\left(\pi_{ik} - \pi_{i(k-1)} \mid \mathrm{class}(\tau_{ik})\right) \times \binom{m - \varphi_0}{\varphi_0} p_1^{\varphi_0} (1 - p_1)^{m - 2\varphi_0} \times \prod_{k=1}^{\varphi_0} t(\tau_{0k} \mid \mathrm{NULL})$$

[0021] where the factors separated by “×” symbols denote fertility, translation, head permutation, non-head permutation, null-fertility, and null-translation probabilities, respectively. The symbols in this formula are: l (the length of e), m (the length of f), e_(i) (the i^(th) English word in e), e₀ (the NULL word), ø_(i) (the fertility of e_(i)), ø₀ (the fertility of the NULL word), τ_(ik) (the k^(th) French word produced by e_(i) in a), π_(ik) (the position of τ_(ik) in f), ρ_(i) (the position of the first fertile word to the left of e_(i) in a), c_(ρi) (the ceiling of the average of all π_(ρik) for ρ_(i), or 0 if ρ_(i) is undefined).
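Two of the quantities defined above lend themselves to a short numerical sketch: the null-fertility factor, which depends only on m, ø₀, and p₁, and the center c_(ρi). The function names and the example values are hypothetical; only the arithmetic follows the definitions in the formula.

```python
import math

def null_fertility_factor(m, phi0, p1):
    """Binomial term: C(m - phi0, phi0) * p1^phi0 * (1 - p1)^(m - 2*phi0)."""
    return math.comb(m - phi0, phi0) * p1 ** phi0 * (1 - p1) ** (m - 2 * phi0)

def center(positions):
    """Ceiling of the average French position, or 0 if there are none."""
    if not positions:
        return 0
    return math.ceil(sum(positions) / len(positions))

# Hypothetical values: a six-word French string with one NULL-generated word.
factor = null_fertility_factor(m=6, phi0=1, p1=0.02)

# A fertile word whose French translations sit at positions 2 and 3.
c = center([2, 3])
```

With m = 6 and ø₀ = 1 there are five non-NULL French words, each of which could have triggered the single NULL-generated word, which is why the binomial coefficient is 5 in this case.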

[0022] In view of the foregoing, given a new sentence f, an optimal decoder will search for an e that maximizes P(e|f) ∝ P(e)·P(f|e). Here, P(f|e) is the sum of P(a,f|e) over all possible alignments a. Because this sum involves significant computation, typically it is avoided by instead searching for an <e,a> pair that maximizes P(e,a|f) ∝ P(e)·P(a,f|e). It is assumed that the language model P(e) is a smoothed n-gram model of English.
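The step from P(e|f) to P(e)·P(f|e) follows from Bayes' rule, since P(f) is constant for a given input sentence f. Written out, as a short derivation that uses only the quantities already defined:

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \frac{P(e)\, P(f \mid e)}{P(f)} = \arg\max_{e} P(e)\, P(f \mid e), \qquad P(f \mid e) = \sum_{a} P(a, f \mid e)$$

Searching for the best <e,a> pair replaces the sum over alignments with a single alignment, which is cheaper to compute but only approximates the exact objective:

$$\langle \hat{e}, \hat{a} \rangle = \arg\max_{\langle e, a \rangle} P(e)\, P(a, f \mid e)$$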

[0023] FIG. 4 is a flowchart describing an operation 400 for generating a statistical TMEM according to an embodiment. The translation model 120 is trained with an existing TMEM 110 (block 405). After training the translation model 120, an extraction module 125 may use the Viterbi (most probable word level) alignment of each sentence, i.e., the alignment of highest probability, to extract tuples of the form <e_(i), e_(i+1), . . . , e_(i+k); f_(j), f_(j+1), . . . , f_(j+l); a_(j), a_(j+1), . . . , a_(j+l)> (block 410), where e_(i), e_(i+1), . . . , e_(i+k) represents a contiguous English phrase, f_(j), f_(j+1), . . . , f_(j+l) represents a contiguous French phrase, and a_(j), a_(j+1), . . . , a_(j+l) represents the Viterbi alignment between the two phrases. For example, in the Viterbi alignment of the two sentences in FIG. 2, which was produced automatically, “there” and “.” are words of fertility 0, NULL generates the French lexeme “.”, “is” generates “est”, “no” generates “aucun” and “ne”, and so on.

[0024] When a different translation model is used, the TMEM may contain, in addition to the contiguous French/English phrases, adjacent information specific to the translation model that is employed.

[0025] The tuples may be selected based on certain criteria. The tuples may be limited to “contiguous” alignments, i.e., alignments in which the words in the English phrase generated only words in the French phrase and each word in the French phrase was generated either by the NULL word or a word from the English phrase. The tuples may be limited to those in which the English and French phrases contained at least two words. The tuples may be limited to those that occur most often in the data. Based on these conditions, six tuples may be extracted from the two sentences in FIG. 2, as shown in FIG. 5.
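The contiguity and minimum-length criteria can be sketched as a brute-force enumeration over phrase spans. This is an illustrative reading of the criteria, not the extraction module 125 itself: the function name, the 0-based indexing with `None` for NULL, and the example sentence pair are all hypothetical, and the frequency criterion is not applied. The sketch is not claimed to reproduce the six tuples of FIG. 5.

```python
def extract_tuples(english, french, alignment, min_len=2):
    """Enumerate <English phrase; French phrase; alignment> tuples.

    alignment[q] is the 0-based index of the English word that generated
    french[q], or None if the French word was generated by NULL.
    """
    tuples = []
    for i in range(len(english)):
        for i_end in range(i + min_len, len(english) + 1):
            for j in range(len(french)):
                for j_end in range(j + min_len, len(french) + 1):
                    # Each French word in the phrase was generated by NULL
                    # or by a word from the English phrase.
                    inside = all(
                        alignment[q] is None or i <= alignment[q] < i_end
                        for q in range(j, j_end))
                    # Words in the English phrase generated only words in
                    # the French phrase (nothing outside the French span).
                    outside = all(
                        alignment[q] is None
                        or not (i <= alignment[q] < i_end)
                        for q in range(len(french))
                        if not (j <= q < j_end))
                    if inside and outside:
                        tuples.append((tuple(english[i:i_end]),
                                       tuple(french[j:j_end]),
                                       tuple(alignment[j:j_end])))
    return tuples

# Hypothetical three-word English phrase aligned to four French words.
english = ["is", "no", "answer"]
french = ["est", "aucun", "ne", "réponse"]
alignment = [0, 1, 1, 2]
found = extract_tuples(english, french, alignment)
```

For this toy pair the enumeration yields three tuples, one for each English span of at least two words, because each such span has exactly one French span that satisfies both conditions.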

[0026] Extracting all tuples of the form <e; f; a> from the training corpus may produce French phrases that were paired with multiple English translations. To reduce ambiguity, one possible English translation equivalent may be chosen for each French phrase (block 415). Different methods for choosing a translation equivalent may be used to construct different probabilistic TMEMs (block 420). A Frequency-based Translation Memory (FTMEM) may be created by associating with each French phrase the English equivalent that occurred most often in the collection of phrases that were extracted. A Probability-based Translation Memory (PTMEM) may be created by associating with each French phrase the English equivalent that corresponded to the alignment of highest probability.
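The two selection methods can be contrasted with a minimal sketch. It assumes each extracted tuple has been reduced to a (French phrase, English phrase, alignment probability) triple; the function names, the triple layout, and the example data are hypothetical.

```python
from collections import Counter

def build_ftmem(extracted):
    """Frequency-based: keep the English equivalent seen most often."""
    counts = {}
    for f, e, _ in extracted:
        counts.setdefault(f, Counter())[e] += 1
    return {f: c.most_common(1)[0][0] for f, c in counts.items()}

def build_ptmem(extracted):
    """Probability-based: keep the equivalent with the most probable alignment."""
    best = {}
    for f, e, p in extracted:
        if f not in best or p > best[f][1]:
            best[f] = (e, p)
    return {f: e for f, (e, _) in best.items()}

# Invented triples: one French phrase paired with two English equivalents.
extracted = [
    ("le pays", "the country", 0.2),
    ("le pays", "the country", 0.1),
    ("le pays", "Canada", 0.6),
]
ftmem = build_ftmem(extracted)
ptmem = build_ptmem(extracted)
```

On this toy data the two memories disagree: the FTMEM keeps "the country" because it was extracted twice, while the PTMEM keeps "Canada" because its single occurrence has the highest alignment probability.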

[0027] The exemplary statistical TMEMs explicitly encode not only the mutual translation pairs but also their corresponding word-level alignments, which may be derived according to a certain translation model, e.g., IBM translation model 4. The mutual translations may be anywhere from two words long to complete sentences. In an exemplary statistical TMEM generation process, an FTMEM and a PTMEM were generated from a training corpus of 500,000 sentence pairs of the Hansard Corpus. Both methods yielded translation memories that contained around 11.8 million word-aligned translation pairs. Due to efficiency considerations, only a fraction of the TMEMs were used, i.e., those that contained phrases at most 10 words long. This yielded a working FTMEM of 4.1 million and a working PTMEM of 5.7 million phrase translation pairs aligned at the word level using the IBM translation model 4.

[0028] To evaluate the quality of both TMEMs, two hundred phrase pairs were randomly extracted from each TMEM. These phrases were judged by a bilingual speaker as perfect, almost perfect, or incorrect translations. A phrase was considered a perfect translation if the judge could imagine contexts in which the aligned phrases could be mutual translations of each other. A phrase was considered an almost perfect translation if the aligned phrases were mutual translations of each other and one phrase contained one single word with no equivalent in the other language. For example, the translation pair “final, le secrétaire de” and “final act, the secretary of” was labeled as almost perfect because the English word “act” has no French equivalent. A translation was considered an incorrect translation if the judge could not imagine any contexts in which the aligned phrases could be mutual translations of each other.

[0029] The results of the evaluation are shown in FIG. 6. A visual inspection of the phrases in the TMEMs and the judgments made by the evaluator suggests that many of the translations labeled as incorrect make sense when assessed in a larger context. For example, “autres régions de le pays que” and “other parts of Canada than” were judged as incorrect. However, when considered in a context in which it is clear that “Canada” and “pays” corefer, it would be reasonable to assume that the translation is correct.

[0030] FIG. 6 shows a few examples of phrases from the FTMEM and their corresponding correctness judgments. These judgments may be stored with the FTMEM to be used during translation. These judgments may be used when matching translation pairs to identify good matches (correct translations), less good matches (almost correct translations), and matches to avoid or reject (incorrect translations).
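One way stored judgments might be used during matching is sketched below. The dictionary layout, the `RANK` table, and the function name are assumptions made for illustration. The first two entries reuse phrase pairs and judgments mentioned in this description, and the third entry is hypothetical.

```python
# Lower rank means a preferred match. "incorrect" is deliberately absent
# from the table so that such entries are rejected.
RANK = {"perfect": 0, "almost perfect": 1}

def usable_matches(entries):
    """Drop rejected entries and order the rest from best to less good."""
    kept = [e for e in entries if e["judgment"] in RANK]
    return sorted(kept, key=lambda e: RANK[e["judgment"]])

entries = [
    {"french": "final, le secrétaire de",
     "english": "final act, the secretary of",
     "judgment": "almost perfect"},
    {"french": "autres régions de le pays que",
     "english": "other parts of Canada than",
     "judgment": "incorrect"},
    # Hypothetical entry, added only to show a "perfect" judgment.
    {"french": "le gouvernement de",
     "english": "the government of",
     "judgment": "perfect"},
]
ranked = usable_matches(entries)
```

The sort is stable, so entries that share a judgment keep their original relative order.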

[0031] A number of embodiments have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, blocks in the flowcharts may be skipped or performed out of order and still produce desirable results. Accordingly, other embodiments are within the scope of the following claims.

1. A method comprising: training a translation model with a plurality of translation pairs, each translation pair including a text segment in a source language and a corresponding text segment in a target language; generating a plurality of tuples from each of a plurality of said translation pairs, each tuple comprising a phrase in the source language, a phrase in the target language, and probability information relating to said phrases; and storing the tuples in a statistical translation memory.
2. The method of claim 1, wherein the probability information relating to said phrases comprises alignment information.
3. The method of claim 1, wherein said generating comprises pairing a plurality of phrases in the target language with a phrase in the source language.
4. The method of claim 3, further comprising: selecting one translation equivalent from said plurality of phrases in the target language; and associating said translation equivalent with the phrase in the source language.
5. The method of claim 4, wherein said selecting comprises selecting a phrase occurring most frequently in the extracted phrases.
6. The method of claim 4, wherein said selecting comprises selecting a phrase having a highest probability of being a correct translation of said phrase in the source language.
7. The method of claim 1, further comprising: judging a correctness of a plurality of said tuples; and selecting a tuple in a translation operation in response to said judgment.
8. A statistical translation memory comprising: a plurality of tuples extracted from a plurality of translation pairs in a corpus, each tuple including a text segment in a source language and a corresponding text segment in a target language.
9. The statistical translation memory of claim 8, wherein each tuple further comprises alignment information relating to the text segments in said each tuple.
10. The statistical translation memory of claim 8, wherein the text segment in the target language in each of a plurality of tuples is selected from a plurality of text segments in the target language extracted from the corpus.
11. The statistical translation memory of claim 10, wherein said selected text segment has been selected based on a calculated probability of correctness.
12. The statistical translation memory of claim 10, wherein said selected text segment has been selected based on a frequency of occurrence in the extracted text segments.
13. Apparatus comprising: a translation model operative to assign a probability to each of a plurality of translation pairs, each translation pair including a text segment in a source language and a corresponding text segment in a target language; an extraction module operative to extract a plurality of tuples from each of a plurality of said translation pairs, each tuple comprising a phrase in the source language, a phrase in the target language, and probability information relating to said phrases; and a statistical translation memory operative to store the tuples.
14. The apparatus of claim 13, wherein the probability information relating to said phrases comprises alignment information.
15. The apparatus of claim 13, wherein the extraction module is further operative to pair a plurality of phrases in the target language with a phrase in the source language.
16. The apparatus of claim 15, wherein the extraction module is further operative to select one translation equivalent from said plurality of phrases and associate said translation equivalent with the phrase in the source language.
17. The apparatus of claim 16, wherein the extraction module is operative to select a phrase occurring most frequently in the extracted phrases from said plurality of phrases.
18. The apparatus of claim 16, wherein the extraction module is operative to select a phrase having a highest probability of being a correct translation of said phrase in the source language.
19. The apparatus of claim 16, wherein the extraction module is operative to select a phrase having an alignment of highest probability with said phrase in the source language.
20. An article comprising a machine-readable medium including instructions operative to cause a machine to: train a translation model with a plurality of translation pairs, each translation pair including a text segment in a source language and a corresponding text segment in a target language; generate a plurality of tuples from each of a plurality of said translation pairs, each tuple comprising a phrase in the source language, a phrase in the target language, and probability information relating to said phrases; and store the tuples in a statistical translation memory.
21. The article of claim 20, wherein the probability information relating to said phrases comprises alignment information.
22. The article of claim 20, wherein said generating comprises pairing a plurality of phrases in the target language with a phrase in the source language.
23. The article of claim 22, further comprising: selecting one translation equivalent from said plurality of phrases in the target language; and associating said translation equivalent with the phrase in the source language.
24. The article of claim 23, wherein said selecting comprises selecting a phrase occurring most frequently in the extracted phrases.
25. The article of claim 23, wherein said selecting comprises selecting a phrase having a highest probability of being a correct translation of said phrase in the source language.
26. The article of claim 23, wherein selecting comprises selecting a phrase having an alignment of highest probability with said phrase in the source language.